Abstract. We describe an approach to modelling and reasoning about data-centric 
business processes and present a form of general model checking. Our technique 
extends existing approaches, which explore systems only from concrete initial 
states. 

Specifically, we model business processes in terms of smaller fragments, whose 
possible interactions are constrained by first-order logic formulae. In turn, pro- 
cess fragments are connected graphs annotated with instructions to modify data. 
Correctness properties concerning the evolution of data with respect to processes 
can be stated in a first-order branching-time logic over built-in theories, such as 
linear integer arithmetic, records and arrays. 

Solving general model checking problems over this logic is considerably harder 
than model checking when a concrete initial state is given. To this end, we present 
a tableau procedure that reduces these model checking problems to first-order 
logic over arithmetic. The resulting proof obligations are passed on to appropriate 
"off-the-shelf" theorem provers. We also detail our modelling approach, describe 
the reasoning components and report on first experiments. 

1 Introduction 

Data is becoming increasingly important to large organisations, both private enterprises 
and large government departments. Recent headlines on "big data" (cf. [7]) suggest 
that many organisations manage unprecedented amounts of structured data, and that 
worldwide, the volume of information processed by machines and humans doubles ap- 
proximately every two years. Organisations need to be able to organise and process data 
according to their defined business processes, and according to business rules that may 
further specify properties of the processed data. 

Unfortunately, most approaches to business process modelling do not adequately 
support the analysis of the complex interactions and dependencies that exist between an 
organisation's processes and data. Although they may support process analysis, help- 
ing users find and remove errors in their models, most fall short when the processes 
are closely tied to structured data. The reasons for this are specific to the concrete for- 
malism used for the analyses, but can normally be traced back to the fact that classical 
propositional logic or discrete Petri-nets are used. Neither of these can adequately rep- 
resent structured data and the operations on it. In other words, these tools' analyses 
make coarse abstractions of the data, and instead focus mostly on the correctness of 
workflows. 
The business artifact approach, initially outlined in [8], was one of the first to tackle 
this issue. It systematically elevates data to be a "first-class citizen", while still offering 
automated support for process analysis. Its cornerstones are artifacts, which are records 
of data values that can change over time due to the modifications performed by services, 
which are formalised using first-order logic. Process analysis is provided, essentially, 
by means of model checking. That is, the following question is answered automatically: 
given some artifact model, a database providing initial values, and a correctness prop- 
erty in terms of a first-order linear-time temporal logic formula (called LTL-FO), do all 
possible artifact changes over time satisfy the correctness property? For the constraints 
given in [5], this problem is always decidable. 

In this paper, we present an approach to modelling and reasoning about data-centric 
business processes, which is similar to this work, but which offers reasoning support that 
goes beyond that work's "concrete model checking". Our approach is based on process 
fragments that describe specific tasks of a larger process, as well as constraints for limit- 
ing the interactions between the fragments. As such it is also inspired by what is known 
as declarative business process modelling [9], meaning that users do not have to create 
a single, large transition system containing all possible task interleavings. Instead, users 
can create many small process fragments whose interconnections are governed by rules 
that determine which executions are permitted. 

In our framework, those rules are given by first-order temporal logic. Unlike [5], we 
choose to extend CTL*, i.e., a branching time logic, rather than LTL, since process 
fragments are essentially annotated graphs and CTL* is, arguably, an appropriate formalism 
to express their properties (cf. [3]). Our database is given in terms of JSON objects [4], 
enriched by a custom, static type system which models and preserves the type informa- 
tion of any input data. Process fragments may modify data, and one can easily state and 
answer the concrete model checking problem as outlined above. 

However, our approach also works if one does not start with an initial concrete 
database; that is, we intend to not only check whether it is possible to reach a bad state 
(e.g., a set of data for which no process fragment is applicable) from some given state 
(i.e., the initial set of data), but also to determine whether for any set of data a bad state 
can be reached. In other words, we support what we call generic model checking. As 
the domains of many data items are infinite (e.g., any item of type integer), this problem 
is considerably harder, in fact, generally undecidable. 

Informally, the two reasoning problems we are interested in are: 

Concrete data model checking problem: Given a specification $S$, a database $s_0$, and 
a CTL*(FO) formula $\Phi$. Does $(s_0, S) \models \Phi$ hold? 

Unrestricted model checking problem: Given a specification $S$ and a CTL*(FO) formula 
$\Phi$. For every database $s_0$, does $(s_0, S) \models \Phi$ hold? 

As will become clear below, a specification is comprised of a process model, logical 
definitions, and constraints to combine process fragments. The relation $(s_0, S) \models \Phi$ 
means that the pair $(s_0, S)$ satisfies the query $\Phi$. See Section 4 for the precise semantics. 

Without any further restrictions, both problems are not even semi-decidable. This 
can be seen, e.g., by reduction from the domain-emptiness problem of 2-register 
machines. Hence, practical approaches need to work with restrictions to recover more 
pleasant complexity properties. 
Process model: 

[Graph of process fragments; only the following labels and annotations are recoverable.] 

Node labels: Invoice, Paid, Shipped, Completed 
Node annotations: entry = "true", exit = "true", final = "true" 
Edge labels: e6, e9 (one further edge label is illegible) 
Edge annotations: 
  guard = "~acceptable(db)" 
  script = "db.status.final = true" 

  guard = "db.status.paid <> true" 
  script = "db.status.paid = true" 

Definitions: 

completed: ∀s:Status . (completed(s) ⇔ (s.paid = true ∧ s.shipped = true)) 
accepted: ∀db:DB . (acceptable(db) ⇔ (¬isEmpty(db.order))) 
readyToShip: ∀s:Status . (readyToShip(s) ⇔ (isEmpty(s.open))) ... 

Constraints: 

nongold: (db.gold = false ⇒ (db.status.shipped = false W db.status.paid = true)) 

Fig. 1. Model of a purchase order system as process fragments and definitions. 



The rest of the paper is structured as follows. In Section 2 we present a running 
example. In Section 3 we explain the way we handle the rich data of our models: with 
JSON values, a special type system for those values, and a sorted first-order logic for 
further constraining and describing those values. This much covers business rules; in 
Section 4 we describe how we can model processes. When processes (actually process 
fragments) combine with rules, we get what we call specifications. In Section 5 we 
describe the tableau-based model checking algorithm that is used to decide user queries 
of the two sorts identified above. Section 6 discusses how we have implemented our 
technology, and describes some experimental results. Finally, we conclude in Section 7. 

2 A Running Example: Purchase Order 

In this section, we introduce a simplified model of a purchase order system using pro- 
cess fragments. The purpose of the modelled system is to accept incoming purchase 
orders and process them further (packing, shipping, etc.), or to decline them straight 
away if there are problems. The whole model is depicted as a graph in Fig. 1, where 
the biggest process fragment is on the left, with further atomic fragments beside it (la- 
belled Paid, Shipped, and Completed, respectively). Both process tasks, represented 
as nodes in the graph, and connections are typically annotated with extra information. 
Node annotations determine whether or not a node is an initial and/or a final node, an 
entry and/or an exit node. This information is used to constrain the ways in which fragments 
can connect. Edges can carry a guard given as a formula and a simple program written 
in the programming language Groovy. The purpose of the program (given in the field 
"script") is to modify the underlying database, which is referred to by the variable db. 

The depicted system model has one initial node, Init, where it waits for a purchase 
order to arrive. Then, the system can either start to pack (i.e., enter node Pack), or 
decline the order (i.e., enter node Declined). An order can be declined if the guard 
($\neg$acceptable(db)) in the annotation of edge $e_1$ is satisfied. The predicate acceptable is 
defined in the Definitions section of our input specification. In a nutshell, the sections 
Definitions and Constraints contain domain-knowledge, encoded as logical rules. (The 
constraint named "nongold" states that non-gold customers must pay before shipment; 
W is the "weak until" operator.) 

If the order is not declined, an attempt will be made to pack its constituents. If all 
are in stock, the process will continue to the node Packed. However, if one or more 
items are missing, they need to be ordered in, which is expressed in the loop between 
the nodes Pack and Stocktake. 

Informally, process fragments are linked together as follows. Starting from a state 
comprised of an init node and a given initial database, an outgoing transition from the 
current state can only be executed if it satisfies the transition's guard. If it is satisfied, 
the associated program is executed to determine the new value of the database, and the 
edge's target node becomes the new current state. The entry and exit annotations im- 
pose implicit constraints on how fragments can be combined: the execution of a new 
process fragment must always start with its entry node, coming from an exit node. In 
other words, there are implicit transitions between all exit and all entry nodes. However, 
if a guard is associated to an entry node, this guard sits on all its implicit incoming tran- 
sitions. The computation stops if from the current state no successor can be reached, 
either because there is no outgoing edge, the guards of all outgoing edges are not satis- 
fied by the current state, or a depth limit has been reached. 
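
The informal execution scheme above can be illustrated with a minimal Python sketch. It is not the authors' implementation: guards and scripts are modelled as plain Python callables rather than FOL formulas and Groovy programs, the edge list covers only the two edges leaving Init, and the trivially true guard on the edge into Pack is an assumption made for illustration.

```python
from copy import deepcopy

# Hypothetical encoding: an edge is (source, target, guard, script).
# Guards and scripts are Python callables standing in for the FOL guards
# and Groovy scripts used in the paper.
def acceptable(db):
    return len(db["order"]) > 0

def decline(db):
    db["status"]["final"] = True

EDGES = [
    ("Init", "Declined", lambda db: not acceptable(db), decline),
    # Assumption: the edge into Pack carries a trivially true guard.
    ("Init", "Pack", lambda db: True, lambda db: None),
]

def successors(node, db, edges):
    """Return all states reachable in one step from the state (node, db)."""
    result = []
    for source, target, guard, script in edges:
        if source == node and guard(db):
            new_db = deepcopy(db)  # the script must not alter the old state
            script(new_db)
            result.append((target, new_db))
    return result

def runs(node, db, edges, depth):
    """Enumerate runs that stop when no successor exists or the depth limit is hit."""
    nexts = successors(node, db, edges) if depth > 0 else []
    if not nexts:
        yield [(node, db)]
        return
    for target, new_db in nexts:
        for rest in runs(target, new_db, edges, depth - 1):
            yield [(node, db)] + rest
```

Calling `list(runs("Init", db, EDGES, 3))` on a database with an empty order yields one run ending in Declined and one ending in Pack; the implicit exit-to-entry transitions between fragments are deliberately left out of this sketch.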

In our example, two possible sequences are Init → Declined, or Init → Pack → ... → 
Invoice → Paid → Shipped → Completed. It is not required to cover all fragments, as 
illustrated by the first run. 

The database, which can be modified by the programs given in the "script" annotations, 
is represented as a JSON object. See, for example, the left hand side of Fig. 2. (The 
right hand side contains type definitions for the JSON data, see also Sec. 3.) The 
program annotated on edge $e_1$, which leads into node Declined, simply sets the field final 
inside status to true. Crucial for our example is the list of open items, under status, 
which has to be empty to be able to ship a purchase order. If it is not, constituents of the 
order are missing and need to be ordered until the list is empty. 

As for sample queries consider the CTL*(FO) formula $\neg(\mathrm{E}\,\mathrm{F}\ db.status.final = true)$, 
which can be seen as a planning goal. The runs on the model above that falsify it lead to 
a database db that has reached a "final" state, with status.final being set to true. Planning 
queries are useful, e.g., for flexible process configuration from fragments during runtime. 
Another interesting query is $\mathrm{A}\,\mathrm{G}\,(\forall s{:}Stock\,.\,(s \in db.stock \Rightarrow s.available \geq 0))$. It 
is a safety property, saying that at all stages in the process run, and for all possible stock 
items, the number of available items is non-negative. Such queries are typical during 
design time, and pose an unrestricted model checking problem. 

{ "order"  : [1],                           DB = { order: List[Integer], 
  "gold"   : true,                                 gold: Bool, 
  "stock"  : [ { "ident" : "Mouse",                stock: List[Stock], 
                 "price" : 10,                     status: Status } 
                 "available" : 0 }, 
               { "ident" : "Monitor",       Stock = { ident: String, 
                 "price" : 200,                       price: Integer, 
                 "available" : 2 },                   available: Integer } 
               { "ident" : "Computer", 
                 "price" : 1000,            Status = { open: List[Integer], 
                 "available" : 4 } ],                  value: Integer, 
  "status" : { "open" : [],                            shipping: Integer, 
               "value" : 0,                            paid: Bool, 
               "shipping" : 0,                         shipped: Bool, 
               "paid" : false,                         final: Bool } 
               "shipped" : false, 
               "final" : false } 
} 

Fig. 2. Left: Example database as JSON document. Right: JSON type constraints. 



3 Modelling Data With JSON Logic 

Faithful modelling of business processes requires being able to model the objects (or 
data) manipulated by the processes and, of course, their evolution over time. In this 
section we focus on data modelling, which is based on JSON extended with a type 
system. 

JSON [4] is a simple, standardised, textual data representation format. In addition to a 
standard set of atomic values such as integers and strings, JSON supports two structuring 
techniques: sequencing ("arrays") and arbitrarily nested hierarchies (through "objects"). 
Our choice of JSON (rather than XML, say), is based on the ease with which 
it can be written and understood by humans. JSON is sufficiently rich to be a plausible 
format for representing the data used in business processes, and its human ease-of-use 
is extremely helpful. 

Other than simply being the medium in which data is represented, there are two 
important functions that JSON must support. Firstly, it must be possible to manipulate 
JSON values in the course of executing a specification. This functionality is realised 
through the use of the Groovy programming notation. 

Secondly, it must be possible to express logical predicates over JSON values, both 
to guard process transitions and to pick out certain forms of value that are of interest. 
In particular, if a specification is to achieve a particular end-goal, with a database being 
in a particular configuration, we need to be able to describe how the various values in 
that database inter-relate. It is this that motivates our choice of the logically expressive 
capabilities of first-order logic, together with sorts such as lists and numbers. 

In addition to first-order predicates, we also use a simple type system over JSON 
values. This provides a simple mapping into the sorts of our underlying first-order logic. 
We note that the type system is indispensable for unrestricted model checking, in order 
to derive from it logical axioms for object and array manipulations. 

3.1 A Type System for JSON 

First we briefly summarise the syntax that is fully described in the IETF RFC [4]: 
JSON values can be numbers, booleans (true and false), strings (written between 
double-quotes, e.g., "a string") and a special value null. JSON's arrays are written 
as comma-separated values between square-brackets, e.g., [1, "string", [true]]. 
JSON objects are similar to records or structures in languages such as Pascal and C. 
They are written as lists of field-name/value pairs between braces. Both forms are 
illustrated in Figure 2. 

Sibling field-names within an object should be unique, and are considered un- 
ordered. Therefore, an object can be thought of as a finite map from field-names to 
further JSON values. Following this conception, we write $Obj\{vf\}$ to denote an object 
whose field names are the domain of the finite map $vf$, with field $s$'s value being $vf(s)$. 

JSON does not impose any restrictions on the structure of values. For example, a 
list may contain both strings and integers. However, we choose to restrict this freedom 
with a simple type system comparable to those in third-generation languages such as C. 
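Let JSON types be denoted by $\tau$, $\tau'$, $\tau_1$ etc., then 

\[ \tau ::= \mathit{Integer} \mid \mathit{Bool} \mid \mathit{String} \mid \mathit{List}[\tau] \mid \mathit{Option}[\tau] \mid \mathit{ObjTy}\{tf\} \mid \mathit{EnumTy}[sl] \]

where $tf$ is a finite map from strings to types, and $sl$ is a list of strings. 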

The Option and EnumTy types are the only ones that do not have an obvious connection 
back to a set of JSON values. The Option type is used to allow for values that are 
not necessarily always initialised, but which come to acquire values as a process pro- 
gresses. We do not expect to see the option-constructor occur with multiple nestings, 
e.g., a type such as Option[Option[String]]. The EnumTy type is used to model finite 
enumerated types, where each value is represented by one of the strings in the provided 
list. This flexibility in the type system allows for more natural modeling. 

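Values are assigned types with the following inductive relation, where we write $v : \tau$ 
to indicate that JSON value $v$ has type $\tau$, where the meta-variables $i$ and $s$ correspond 
to all possible integer and string values respectively, and where we use $e \in l$ to mean 
that element $e$ is a member of list $l$: 

\[ \frac{}{\mathit{true} : \mathit{Bool}} \qquad \frac{}{\mathit{false} : \mathit{Bool}} \qquad \frac{}{i : \mathit{Integer}} \qquad \frac{}{s : \mathit{String}} \]

\[ \frac{s \in sl}{s : \mathit{EnumTy}[sl]} \qquad \frac{}{\mathit{null} : \mathit{Option}[\tau]} \qquad \frac{v : \tau}{v : \mathit{Option}[\tau]} \]

\[ \frac{\forall v \in els.\ v : \tau}{[els] : \mathit{List}[\tau]} \]

\[ \frac{dom(vf) = dom(tf) \qquad \forall s \in dom(vf).\ vf(s) : tf(s)}{\mathit{Obj}\{vf\} : \mathit{ObjTy}\{tf\}} \]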

This type system is simple and designed to be pragmatic. Meta-theoretically, it is not 
particularly elegant. In particular, values may have multiple types: if a value $v$ is of type 
$\tau$, then it is also of type $\mathit{Option}[\tau]$; string values are not just of type String, but also 
have an arbitrary number of possible enumeration types. 
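
The typing relation can be read directly as a recursive check. The following Python sketch is an illustration only: the tuple encoding of types (for example `("List", ("Integer",))`) and the function name `has_type` are assumptions of this sketch, and JSON values are represented by the Python objects that `json.loads` would produce.

```python
def has_type(v, t):
    """Check the typing relation v : t for a JSON value v and a type t.

    Types are encoded as tuples (an assumption of this sketch):
    ("Integer",), ("Bool",), ("String",), ("List", t), ("Option", t),
    ("ObjTy", {field: type}), ("EnumTy", [strings]).
    """
    kind = t[0]
    if kind == "Bool":
        return isinstance(v, bool)
    if kind == "Integer":
        # bool is a subclass of int in Python, so it is excluded explicitly.
        return isinstance(v, int) and not isinstance(v, bool)
    if kind == "String":
        return isinstance(v, str)
    if kind == "EnumTy":
        return isinstance(v, str) and v in t[1]
    if kind == "Option":
        # null : Option[t], and every value of type t also has type Option[t].
        return v is None or has_type(v, t[1])
    if kind == "List":
        return isinstance(v, list) and all(has_type(e, t[1]) for e in v)
    if kind == "ObjTy":
        tf = t[1]
        return (isinstance(v, dict)
                and v.keys() == tf.keys()  # dom(vf) = dom(tf)
                and all(has_type(v[s], tf[s]) for s in tf))
    raise ValueError(f"unknown type constructor: {kind}")


STOCK = ("ObjTy", {"ident": ("String",),
                   "price": ("Integer",),
                   "available": ("Integer",)})
```

For instance, `has_type({"ident": "Monitor", "price": 200, "available": 2}, STOCK)` holds, while a mixed list such as `[1, "a"]` is rejected by `("List", ("Integer",))`. The sketch also exhibits the non-uniqueness of types noted above: a string is accepted both by `("String",)` and by any enumeration type that lists it.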
3.2 From JSON to First-Order Logic 



When a user develops a business specification, we expect them to name the various 
types of interest with the type system above. When concrete initial values are given for 
a concrete model-checking problem, we use that type system to check that these values 
really do have the appropriate type. The same system is used to ensure that logical 
guards and goal-conditions are sensible, as discussed below. It also plays a pivotal role 
in our reasoning procedure for the unrestricted model checking problem, which requires 
reflecting the semantics of a JSON type model in many-sorted first-order logic. We 
describe this next. 

We fix a non-empty set $S$ of sorts and a first-order logic signature $\Sigma$ comprised of 
function and predicate symbols of given arities over $S$. We assume infinite supplies of 
variables, one for every sort in $S$. A constant is a 0-ary function symbol. The 
(well-sorted $\Sigma$-)terms and atoms are defined as usual. We assume $\Sigma$ contains a predicate 
symbol $\approx_s$ (equality) of arity $s \times s$, for every sort $s \in S$. Equational atoms, or just 
equations, are written infix, usually without the subscript $s$, as in $1 + 1 \approx 2$. We write 
$\phi[x]$ to indicate that every free variable in the formula $\phi$ is among the list $x$ of variables, 
and we write $\phi[t]$ for the formula obtained from $\phi[x]$ by replacing all its free variables 
$x$ by the corresponding terms in the list $t$. 

We assume a sufficiently rich set of Boolean connectives (such as $\{\neg, \wedge\}$) and the 
quantifiers $\forall$ and $\exists$. The well-sorted $\Sigma$-formulas, or just (FO) formulas, are defined as 
usual. We are particularly interested in signatures containing (linear) integer arithmetic. 
For that, we reserve the sort symbol $\mathbb{Z}$, the constants $0, \pm 1, \pm 2, \ldots$, the function symbols 
$+$ and $-$, and the predicate symbol $>$, each of the expected arity over $\mathbb{Z}$. 

The semantics of our logic is the usual one: a $\Sigma$-interpretation $I$ consists of 
non-empty, disjoint sets, called domains, one for each sort in $S$. We require that the domain 
for $\mathbb{Z}$ is the set of integers, and that every arithmetic function and predicate symbol 
is mapped to its obvious function over the integers. A (variable) assignment $\alpha$ is a 
mapping from the variables into their corresponding domains. Given a formula $\phi$ and 
a pair $(I, \alpha)$ we say that $(I, \alpha)$ satisfies $\phi$, and write $(I, \alpha) \models \phi$, iff $\phi$ evaluates to 
true under $I$ and $\alpha$ in the usual sense (the component $\alpha$ is needed to evaluate the free 
variables in $\phi$). If $\phi$ is closed then $\alpha$ is irrelevant and we can write $I \models \phi$ instead of 
$(I, \alpha) \models \phi$. We say that a closed sentence $\phi$ is valid (satisfiable) iff $I \models \phi$ for all (some) 
interpretations $I$. 

In order to map our JSON modelling framework to FOL we let the sorts $S$ contain 
all the defined type names in the JSON type model of the given specification. In the 
example in Section 2 these are DB, Stock and Status. Without loss of generality we 
assume that the top-level type in a JSON type model is always called DB.² We call 
any JSON term of type DB a database. See again Section 2 for an example. We fix a 
dedicated variable db of sort DB. Informally, db will be used to hold the database at the 
current time point. 

Furthermore, we must provide mappings into FOL from terms that are specific to 
JSON. In some sense, both JSON's arrays and its objects are generic "arrays", values 
that can be seen as collections of independently addressable components. The JSON 
syntax for that is a usual one: a[i] denotes the value of the i-th element of array a; and 
obj.fld denotes the value of obj's field called fld. These are the accessor operations. 
Their FOL representation (as terms) is index(a, i) and fld(obj), respectively. 

² We need additional sorts, e.g., for truth values and integers, as mentioned. The sorts in S are 
written in italics, as in DB. 

This mapping allows us to formulate predicates on JSON data in FOL. For example, 
the guard db.status.paid <> true in Sec. 2's example maps to the formula 
$paid(status(db)) \not\approx true$. We also support updator operations for both arrays and 
objects. For arrays, we have update(a, i, v), which denotes an array that is everywhere the 
same as a except that at index i it has value v. For objects, we have analogous updator 
functions per field. If an object type had fields fld1, fld2 etc., we would then have the 
term upd_fld1(obj, v), denoting an object everywhere the same as obj except with value 
v for its field fld1. We note that these mappings can be automated without effort. With 
field and array updators to hand, we can translate a model's scripts (Groovy fragments 
on graph-edges) into a logical form. This translation is to a term of one free variable db, 
denoting the effect of that script on db. 
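
To make the accessor and updator mapping concrete, here is a small Python sketch that turns a dotted JSON path into a FOL term (as a string) and a single assignment into an update term. The function names `access_term` and `update_term` are hypothetical, and the nested form produced for assignments to inner fields is an assumption of this sketch; the text only fixes the per-field updators such as upd_fld1(obj, v).

```python
def access_term(path):
    """Translate an accessor path such as 'db.status.paid' into a FOL term."""
    root, *fields = path.split(".")
    term = root
    for field in fields:
        term = f"{field}({term})"
    return term


def update_term(path, value):
    """Translate the assignment 'path = value' into an update term over db.

    Assumption of this sketch: an assignment to a nested field is expressed
    by rebuilding every enclosing object with its per-field updator.
    """
    root, *fields = path.split(".")
    # Terms denoting the objects along the path: db, status(db), ...
    prefixes = [root]
    for field in fields[:-1]:
        prefixes.append(f"{field}({prefixes[-1]})")
    term = value
    # Rebuild from the innermost field outwards.
    for obj, field in zip(reversed(prefixes), reversed(fields)):
        term = f"upd_{field}({obj}, {term})"
    return term
```

Under these assumptions, the guard path `db.status.paid` becomes `paid(status(db))`, and the script `db.status.paid = true` becomes `upd_status(db, upd_paid(status(db), true))`, a term whose only free variable is db.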

Because standard FOL theorem provers do not natively support the theory of arrays 
and objects, we generate suitable FOL axioms from the given JSON type model. 
For arrays, the appropriate axioms are well-known and for objects, there are analogous 
axioms. For example, $fld1(upd\_fld1(obj, v)) \approx v$, and $fld2(upd\_fld1(obj, v)) \approx fld2(obj)$. 
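
These read-over-write axioms can be checked against a simple concrete model. The Python sketch below interprets objects as dictionaries and arrays as lists; the helper names `fld`, `upd`, `index` and `update` are stand-ins chosen for this illustration and are not part of any prover interface.

```python
def fld(name, obj):
    """Accessor fld(obj) for the field called name."""
    return obj[name]


def upd(name, obj, v):
    """Updator upd_fld(obj, v): a copy of obj whose field name holds v."""
    new = dict(obj)
    new[name] = v
    return new


def index(a, i):
    """Accessor index(a, i) for arrays."""
    return a[i]


def update(a, i, v):
    """Updator update(a, i, v): a copy of a that holds v at index i."""
    new = list(a)
    new[i] = v
    return new


status = {"paid": False, "shipped": False}
updated = upd("paid", status, True)

# fld1(upd_fld1(obj, v)) = v
assert fld("paid", updated) is True
# fld2(upd_fld1(obj, v)) = fld2(obj)
assert fld("shipped", updated) == fld("shipped", status)

items = [2, 4, 6]
# The analogous array axioms: reading the updated index gives the new value,
# every other index is unchanged.
assert index(update(items, 1, 9), 1) == 9
assert index(update(items, 1, 9), 0) == index(items, 0)
```

Because the updators return fresh copies, the original object plays the role of the unchanged term `obj` on the right-hand side of the second axiom.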

In addition, we have concrete syntax for writing complete values (e.g., [2,4,6] 
for a list of three elements), though this is actually just syntactic sugar for a chain 
of updates over some underlying base object. In particular, any database has a (FOL) 
term representation, called "database as a term" below. Moreover, this same term 
language allows us to give partial specifications of filled databases. For example, the term 
upd_gold(db, true) stands for a (any) database represented by the constant db whose 
gold field holds the value true, with the other fields arbitrary. Indeed, analysing such 
partially filled databases is one of the main goals of our research agenda. 

4 Modelling Processes 

In this section we describe our framework for modelling processes. As said earlier, it is 
centered around the notion of process fragments that manipulate databases over time. 
The cooperation of the fragments is described by (temporal) constraints. All constraints 
and guards in state transitions may refer to user-specified predicates on (components of) 
the database, which we call (logical) definitions here. We will introduce these compo- 
nents now. 

4.1 Process Fragments 

A guard $\gamma$ is a FOL formula with free variables at most {db}; an update term $u$ is a FOL 
term with free variables at most {db}. By Guard (Update) we denote the set of all guards 
(update terms); GProg is the set of all Groovy programs. Without further formalization 
we assume the Groovy programs are "sensible" and describe database updates that can 
be characterized as update terms. 
A process fragment $F$ is a directed labeled graph $(N, E, \lambda_N, \lambda_E)$, where $N$ is a set of 
nodes, $E \subseteq N \times N$ is a set of edges, $\lambda_E : E \mapsto Guard \times GProg \times Update$ is an edge 
labeling function, and $\lambda_N : N \mapsto 2^{\{init, entry, exit\} \cup Guard}$ is a node labeling function. 

The informal semantics of process fragments has been given in Section 2 already. 
The precise semantics of a set of process fragments is given by first translating it into 
one single process model P and then defining the semantics of P in terms of its runs. 

More formally, a process (model) $P$ is a quadruple $(N, n_0, E, \lambda_E)$ where $N$, $E$ and $\lambda_E$ 
are as above and $n_0 \in N$ is the initial node. Suppose as given a set $\mathcal{F} = \{F_1, \ldots, F_k\}$ 
of process fragments, for some $k \geq 1$, where $F_i = (N_i, E_i, \lambda_N^i, \lambda_E^i)$ and $N_i$ and $N_j$ are 
disjoint, for all $i \neq j$. Suppose further, without loss of generality, that exactly one node 
in $\bigcup_{1 \leq i \leq k} N_i$ is labeled as an init node. Let $n_0$ be that node. The process model 
$P = (N, n_0, E, \lambda_E)$ associated to $\mathcal{F}$ is defined as follows: 

\[ N = \bigcup_{1 \leq i \leq k} N_i \qquad E = \Big(\bigcup_{1 \leq i \leq k} E_i\Big) \cup E^+ \qquad \lambda_E = \Big(\bigcup_{1 \leq i \leq k} \lambda_E^i\Big) \cup \lambda^+ \]

where ($\varepsilon$ denotes the empty Groovy program) 

\[ E^+ = \{(m, n) \mid m \in N_i,\ n \in N_j,\ exit \in \lambda_N^i(m) \text{ and } entry \in \lambda_N^j(n), \text{ for some } 1 \leq i, j \leq k\} \]
\[ \lambda^+ = \{(m, n) \mapsto (\gamma, \varepsilon, db) \mid (m, n) \in E^+ \text{ and } \{entry, \gamma\} \subseteq \lambda_N^j(n), \text{ for some } 1 \leq j \leq k\} \]

For the above construction to be well-defined we require that every entry node in every 
fragment $F_j$ is also labeled with a guard $\gamma$ (which could be $\top$). 
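
The construction of the process model from a set of fragments can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' code: the `Fragment` class and its field names are hypothetical, guards and update terms are plain strings, the empty string stands for the empty Groovy program, and an explicit edge is kept if it coincides with an implicit one (a case the definition above leaves open).

```python
from dataclasses import dataclass, field


@dataclass
class Fragment:
    nodes: set
    edges: dict   # (m, n) -> (guard, script, update_term)
    labels: dict  # node -> subset of {"init", "entry", "exit"}
    entry_guards: dict = field(default_factory=dict)  # entry node -> guard


def combine(fragments):
    """Build (N, n0, lambda_E) from fragments; E is the key set of lambda_E."""
    nodes, edges = set(), {}
    for f in fragments:
        nodes |= f.nodes
        edges.update(f.edges)
    inits = [n for f in fragments for n in f.nodes
             if "init" in f.labels.get(n, set())]
    if len(inits) != 1:
        raise ValueError("exactly one node must be labeled init")
    # Implicit transitions E+: from every exit node to every entry node.
    for fi in fragments:
        for m in fi.nodes:
            if "exit" not in fi.labels.get(m, set()):
                continue
            for fj in fragments:
                for n in fj.nodes:
                    if "entry" in fj.labels.get(n, set()):
                        # Every entry node must carry a guard (possibly "true");
                        # a missing guard raises KeyError here.
                        guard = fj.entry_guards[n]
                        # lambda+: entry guard, empty program, identity update db.
                        edges.setdefault((m, n), (guard, "", "db"))
    return nodes, inits[0], edges
```

With a main fragment whose node Invoice is an exit node and an atomic fragment Paid that is both entry and exit, `combine` adds the implicit edges Invoice → Paid and Paid → Paid, each labeled with Paid's entry guard.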

4.2 Definitions and Constraints 

Definitions are logical abbreviations. As such, they are not semantically necessary. 
Nonetheless, just as in mathematics, they are a crucial aid in the construction and 
comprehensibility of useful models. Formally, a definition (for $p$) is a closed formula of the 
form $\forall x{:}s\,.\,p(x) \Leftrightarrow \phi[x]$ where $x$ is a list of variables of sorts $s \subseteq S$, $p$ is a predicate 
symbol of the proper arity, and $\phi$ is a formula. 

Constraints specify how process fragments can be combined. The idea has been 
pursued before, e.g., in the Declare system [9], which uses propositional (linear) temporal 
logic for that. In order to take data into account, we work with a fragment of CTL* over 
first-order logic, which we refer to as CTL*(FO). The syntax of our CTL*(FO) state formulae 
is given by $\Phi ::= \xi \mid \neg\Phi \mid \Phi \wedge \Phi \mid \mathrm{A}\psi \mid \mathrm{E}\psi$, where $\xi$ is a FO formula with free variables 
at most $\{db\}$, and $\psi$ a path formula defined via $\psi ::= \Phi \mid \neg\psi \mid \psi \wedge \psi \mid \mathrm{X}\psi \mid \overline{\mathrm{X}}\psi \mid \psi\,\mathrm{U}\,\psi$. 
(The operator $\overline{\mathrm{X}}$ is "weak next".) A constraint then is simply a state formula. Notice 
that because constraints may contain the free variable db, our logic is not obtained from 
propositional CTL* by replacing propositional variables by closed formulas. 

Figure 1 contains some examples of definitions and constraints. 

4.3 Specifications and Semantics 

The modelling components described so far are combined into specifications. 
Formally, a specification $S$ is a tuple $(P, D, C)$ where $P$ is a process, $D$ is a set of 
definitions and $C$ is a set of constraints. An instance $I$ (of $S$) is a pair $(s_0, S)$, where $s_0$ is a 
database (as a term) and $S$ is a specification. 
We are now in the position to provide a formal definition for the model checking 
problems stated in the introduction. Let $S = (P, D, C)$ be as above, where $P$ is of the 
form $(N, n_0, E, \lambda_E)$ and $\Phi$ a state formula with free variables at most $\{db\}$, the query. 

As a first step to define the satisfaction relation $(s_0, S) \models \Phi$ between an instance 
and a query we make the constraints $C$ part of the query. Assume $\Phi$ is given in negation 
normal form (this is always possible) and that it starts with a path quantifier (E or A). 
The expanded query $\Phi_C$ is the formula $\mathrm{A}(C \Rightarrow \psi)$ if $\Phi = \mathrm{A}\psi$, for some formula $\psi$, and 
it is $\mathrm{E}(C \wedge \psi)$ if $\Phi = \mathrm{E}\psi$. Here, $C$ is read as a conjunction of its elements. (The rationale 
for this definition is that the desired treatment of constraints is indicated by the path 
quantifier in the query.) Notice that with $\Phi$ also $\Phi_C$ is a query. Now define $(s_0, S) \models \Phi$ 
iff $(s_0, P, D) \models \Phi_C$, i.e., the triple $(s_0, P, D)$ satisfies $\Phi_C$. It remains to define the latter 
satisfaction relation, which we turn to now. 
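
The query expansion is a purely syntactic step, as the following Python sketch illustrates. The tuple encoding of formulas and the names `conjunction` and `expand_query` are assumptions made for this example only.

```python
def conjunction(formulas):
    """Read a list of constraints as one conjunction (the empty list is true)."""
    result = ("true",)
    for f in formulas:
        result = f if result == ("true",) else ("and", result, f)
    return result


def expand_query(query, constraints):
    """Fold the constraints C into a query of the form A psi or E psi."""
    quantifier, psi = query
    c = conjunction(constraints)
    if quantifier == "A":
        # Universal queries only range over runs that obey the constraints.
        return ("A", ("implies", c, psi))
    if quantifier == "E":
        # Existential queries need a witness run that obeys the constraints.
        return ("E", ("and", c, psi))
    raise ValueError("query must start with a path quantifier A or E")
```

For example, with a single constraint `nongold`, the query `("A", psi)` becomes `("A", ("implies", nongold, psi))`, whereas `("E", psi)` becomes `("E", ("and", nongold, psi))`.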

As a convenience, we say that $P$ contains a transition $m \xrightarrow{\gamma, u} n$ if $(m, n) \in E$ and 
$\lambda_E(m, n) = (\gamma, u)$, for some guard $\gamma$ and Groovy program $u$ as an update term. 

A run $r$ (of $(P, D)$) from $s_0$ is a possibly infinite sequence $(n_0, s_0)(n_1, s_1)(n_2, s_2) \cdots$ 
of pairs of nodes and databases, also called states, such that (i) $P$ contains transitions 
of the form $(n_i \xrightarrow{\gamma_i, u_i} n_{i+1})$, (ii) $D \models \gamma_i[s_i]$ and (iii) $s_{i+1} = u_i[s_i]$. In item (i) in 
case $i = 0$ the node $n_0$ is meant to be the initial node $n_0$ in $P$. Notice that in item (ii) 
the definitions $D$ play the role of axioms from which the instantiated guard $\gamma_i[s_i]$ is to 
follow. Occasionally the nodes in a run are not important, and we confuse a run with its 
projection on the states $s_0 s_1 s_2 \cdots$. 

For a run $r = (n_0, s_0)(n_1, s_1)(n_2, s_2) \cdots$ and $i \geq 0$ we define $r[i] = (n_i, s_i)$, sometimes 
also $r[i] = s_i$. By $r^i$ we denote the truncated run $(n_i, s_i)(n_{i+1}, s_{i+1}) \cdots$, by $|r|$ the number 
of elements in the run or $\infty$, if $r$ is, in fact, infinite. Obviously, $r^0 = r$. 

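For any formula $\Phi \in$ CTL*(FO) with free variables at most $\{db\}$ we define $(s_0, P, D) \models \Phi$ 
as follows: 

$(s_0, P, D) \models \xi$ iff $\models (D \Rightarrow \xi[s_0])$ 
$(s_0, P, D) \models \neg\Phi$ iff $(s_0, P, D) \models \Phi$ is not true 
$(s_0, P, D) \models \Phi_1 \wedge \Phi_2$ iff $(s_0, P, D) \models \Phi_1$ and $(s_0, P, D) \models \Phi_2$ 
$(s_0, P, D) \models \mathrm{A}\psi$ iff $(P, D, r) \models \psi$ for all runs $r$ starting in $n_0$ 
$(s_0, P, D) \models \mathrm{E}\psi$ iff $(P, D, r) \models \psi$ for some run $r$ starting in $n_0$, 

where the relation $(P, D, r) \models \psi$ is defined as 

$(P, D, r) \models \Phi$ iff $(s_0, P, D) \models \Phi$ 
$(P, D, r) \models \neg\psi'$ iff $(P, D, r) \models \psi'$ is not true 
$(P, D, r) \models \psi'_1 \wedge \psi'_2$ iff $(P, D, r) \models \psi'_1$ and $(P, D, r) \models \psi'_2$ 
$(P, D, r) \models \mathrm{X}\psi'$ iff $|r| > 1$ and $(P, D, r^1) \models \psi'$ 
$(P, D, r) \models \overline{\mathrm{X}}\psi'$ iff $|r| \leq 1$, or $|r| > 1$ and $(P, D, r^1) \models \psi'$ 
$(P, D, r) \models \psi'_1\,\mathrm{U}\,\psi'_2$ iff there exists a $j \geq 0$, such that $|r| > j$ and $(P, D, r^j) \models \psi'_2$, 
and $(P, D, r^i) \models \psi'_1$ for all $0 \leq i < j$ 
$(P, D, r) \models \psi'_1\,\mathrm{R}\,\psi'_2$ iff $(P, D, r^i) \models \psi'_2$ for all $i < |r|$, or there exists a $j \geq 0$, such that 
$|r| > j$, $(P, D, r^j) \models \psi'_1$ and $(P, D, r^i) \models \psi'_2$ for all $0 \leq i \leq j$. 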

We further assume the usual "syntactic sugar", such as $\vee$, $\Rightarrow$ (implies), G (always),
F (eventually), or W (weak until) operators, which can easily be defined in terms of
the above set of operators in the expected way. Note that we distinguish a strong next
operator, $X$, from a weak next operator, $\bar{X}$, as described in [1]. This gives rise to the
following equivalences: $\psi\,R\,\varphi \equiv \varphi \wedge (\psi \vee \bar{X}(\psi\,R\,\varphi))$ and $\psi\,U\,\varphi \equiv \varphi \vee (\psi \wedge X(\psi\,U\,\varphi))$, as
one can easily verify by using the above semantics. This choice is motivated by our
bounded model checking algorithm, which has to evaluate CTL*(FO) formulae over
finite traces as opposed to infinite ones. For example, when evaluating a safety formula,
such as $G\psi$, we want a trace of length $n$ that satisfies $\psi$ in all positions $i < n$ to be a
model of said formula. On the other hand, if there is no position $i < n$ such that $\psi$ is
satisfied, we do not want this trace to be a model of $F\psi$. This is achieved in our logic, as
$G\psi \equiv \psi \wedge \bar{X}G\psi$ and $F\psi \equiv \psi \vee XF\psi$ hold. Note also that $\neg X\psi \not\equiv X\neg\psi$, but $\neg X\psi \equiv \bar{X}\neg\psi$.
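
The finite-trace reading of the operators can be made concrete with a small evaluator. The following is a minimal sketch, not the implementation described in this paper: it assumes a propositional stand-in for the first-order state formulas (a hypothetical `("atom", predicate)` leaf applied to each state), a hypothetical tuple encoding of formulas, and finite non-empty traces given as Python lists. The names `holds`, `F`, `G`, `"wX"` are illustrative only.

```python
# Hypothetical tuple encoding (an assumption of this sketch):
#   ("atom", pred)  ("not", f)  ("and", f, g)  ("or", f, g)
#   ("X", f) strong next   ("wX", f) weak next   ("U", f, g)   ("R", f, g)

def holds(f, trace, i=0):
    """Evaluate f on the suffix of a finite, non-empty trace starting at position i."""
    op = f[0]
    last = i == len(trace) - 1          # no successor state exists
    if op == "atom":
        return bool(f[1](trace[i]))
    if op == "not":
        return not holds(f[1], trace, i)
    if op == "and":
        return holds(f[1], trace, i) and holds(f[2], trace, i)
    if op == "or":
        return holds(f[1], trace, i) or holds(f[2], trace, i)
    if op == "X":                       # strong next: a successor must exist
        return not last and holds(f[1], trace, i + 1)
    if op == "wX":                      # weak next: true when there is no successor
        return last or holds(f[1], trace, i + 1)
    if op == "U":                       # f U g  ==  g or (f and X(f U g))
        return holds(f[2], trace, i) or (
            holds(f[1], trace, i) and not last and holds(f, trace, i + 1))
    if op == "R":                       # f R g  ==  g and (f or wX(f R g))
        return holds(f[2], trace, i) and (
            holds(f[1], trace, i) or last or holds(f, trace, i + 1))
    raise ValueError(f"unknown operator: {op}")

TRUE = ("atom", lambda s: True)
FALSE = ("atom", lambda s: False)

def F(f):                               # eventually: true U f
    return ("U", TRUE, f)

def G(f):                               # always: false R f
    return ("R", FALSE, f)

positive = ("atom", lambda s: s > 0)
large = ("atom", lambda s: s > 5)
trace = [1, 2, 3]

print(holds(G(positive), trace))        # safety formula on a finite trace
print(holds(F(large), trace))           # no position satisfies `large`
```

Defining G via R (and hence via the weak next) and F via U (and hence via the strong next) is what lets the safety formula succeed at the end of the trace while the eventuality fails there.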

5 Reasoning with Tableaux for CTL*(FO) 

Tableau calculi for temporal logics have long been considered (see, e.g., [6]) as
an appropriate and natural reasoning procedure. There is also a version for propositional
CTL* [11]. However, we are not aware of a first-order tableau calculus
that accommodates our requirements, hence we devise one below. We note that
we circumvent the difficult problem of loop detection by working in a bounded model
checking setting, where runs are artificially terminated when they become too long.

Suppose we want to solve an unrestricted model checking problem, i.e., to show that
$(s_0, P, D) \models \varphi_C$ holds for every database $s_0$. As usual with tableau calculi, this is done
by attempting to construct a countermodel, i.e., a model for the negation of this statement. The
universally quantified database $s_0$ then becomes a Skolem constant, say, db, representing
an (unknown) initial database. A state then is a pair of the form $(n, u[\mathrm{db}])$ where $n \in N$
and $u[\mathrm{db}]$ is an update term instantiated with that initial database. We find it convenient
to formulate the calculus' inference rules as operators on (sets of) sequents. A sequent
is an expression of the form $s \vdash_Q \Phi$ where $s$ is a state, $Q \in \{E, A\}$ is a path quantifier,
and $\Phi[\mathrm{db}]$ is a (possibly empty) set of CTL*(FO) formulas in negation normal form
with free variables at most $\{\mathrm{db}\}$. When we write $s \vdash_Q \varphi, \Phi$ we mean $s \vdash_Q \{\varphi\} \cup \Phi$.

The informal semantics of a sequent $(n, u[\mathrm{db}]) \vdash_Q \Phi[\mathrm{db}]$ is "some run of the instance
(db, P, D) has reached the state $(n, u[\mathrm{db}])$ and $(n, u[\mathrm{db}]) \models Q\,\Phi[u[\mathrm{db}]]$".

Being a tableau calculus, the calculus below derives trees that represent disjunctions of
conjunctions of formulas. More precisely, the nodes are labeled with sets of sequents
that are read conjunctively, and sibling nodes are connected disjunctively. The purpose
of the calculus' inference rules is to analyse a given sequent by breaking up the formulas
in the sequent according to their boolean operators, path quantifiers and temporal
operators. An additional implicit and/or structure is given by reading the formulas $\Phi$ in
$s \vdash_E \Phi$ conjunctively, and reading the formulas $\Phi$ in $s \vdash_A \Phi$ disjunctively. The reason is
that A does not distribute over "or" and E does not distribute over "and".

We need some more definitions to formulate the calculus. A formula is classical iff
it contains no path quantifier and no temporal operator. A formula is a modal atom iff
its top-level operator is a path quantifier or a temporal operator. A sequent $s \vdash_Q \Phi$ is
classical if all formulas in $\Phi$ are classical.

A tableau node is a (possibly empty) set of sequents, denoted by the letter $\Sigma$. We
often write $\sigma; \Sigma$ instead of $\{\sigma\} \cup \Sigma$. We simply speak of "nodes" instead of "tableau
nodes" if confusion with the nodes in graphs is unlikely.

Let $\varphi_C$ be a given expanded query and S a specification as introduced before. The
initial sequent is the sequent $s_0 \vdash_E \neg\varphi_C$, where $s_0 = (n_0, \mathrm{db})$ is the initial state, for
some fresh constant db. Notice that the expanded query is negated, corresponding to
the intuition of attempting to compute a countermodel for the expanded
query.

Because we are adopting a standard notion of tableau derivations it suffices to define 
the inference rules. (The root node contains the initial sequent only.) The components 
P and D are left implicit below. 

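Boolean rules. The implicit reading of $\Phi$ as disjunctions/conjunctions in a $\vdash_A$/$\vdash_E$ sequent
sanctions the following rules.

$$
\text{E-}\wedge\ \frac{s \vdash_E \varphi \wedge \psi, \Phi;\ \Sigma}{s \vdash_E \varphi, \psi, \Phi;\ \Sigma}
\qquad
\text{E-}\vee\ \frac{s \vdash_E \varphi \vee \psi, \Phi;\ \Sigma}{s \vdash_E \varphi, \Phi;\ \Sigma \qquad s \vdash_E \psi, \Phi;\ \Sigma}
$$

$$
\text{A-}\vee\ \frac{s \vdash_A \varphi \vee \psi, \Phi;\ \Sigma}{s \vdash_A \varphi, \psi, \Phi;\ \Sigma}
\qquad
\text{A-}\wedge\ \frac{s \vdash_A \varphi \wedge \psi, \Phi;\ \Sigma}{s \vdash_A \varphi, \Phi;\ s \vdash_A \psi, \Phi;\ \Sigma}
$$

if $\varphi$ is not classical or $\psi$ is not classical (no need to break classical formulas apart).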
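
The four Boolean rules can be read as one function from a sequent to a list of successor nodes. The sketch below is only an illustration under stated assumptions: formulas are nested tuples with string atoms, a state is abbreviated to a plain string, and `Sequent`, `is_classical` and `boolean_rule` are hypothetical names that do not come from the paper. A returned list with two nodes corresponds to branching; a single node with two sequents corresponds to a conjunctive split.

```python
from typing import FrozenSet, List, NamedTuple

MODAL = {"E", "A", "X", "wX", "U", "R"}   # path quantifiers and temporal operators

def is_classical(f) -> bool:
    """A formula is classical iff it contains no path quantifier or temporal operator."""
    if isinstance(f, str):                # atoms are plain strings in this sketch
        return True
    return f[0] not in MODAL and all(is_classical(g) for g in f[1:])

class Sequent(NamedTuple):
    state: str            # stands in for a state (n, u[db])
    q: str                # path quantifier, "E" or "A"
    formulas: FrozenSet   # the formula set Phi

def boolean_rule(seq: Sequent, f, sigma: FrozenSet) -> List[FrozenSet]:
    """Apply the Boolean rule for f in seq; returns one node per branch."""
    op, a, b = f
    if op not in ("and", "or") or (is_classical(a) and is_classical(b)):
        raise ValueError("rule not applicable")
    rest = seq.formulas - {f}

    def mk(*fs):
        return Sequent(seq.state, seq.q, rest | frozenset(fs))

    if (seq.q, op) in {("E", "and"), ("A", "or")}:    # E-and, A-or: one sequent
        return [sigma | {mk(a, b)}]
    if seq.q == "E":                                  # E-or: two branches
        return [sigma | {mk(a)}, sigma | {mk(b)}]
    return [sigma | {mk(a), mk(b)}]                   # A-and: two sequents, no branching
```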

Rules to separate classical sequents. The following rules separate away the classical
formulas from the modal atoms in $\Phi$. Every classical sequent can be passed on to a
first-order theorem prover; if the result is "unsatisfiable" then the node is closed.

$$
\text{E-Split}\ \frac{s \vdash_E \Phi;\ \Sigma}{s \vdash_E \Gamma[u[\mathrm{db}]];\ s \vdash_E \Phi \setminus \Gamma;\ \Sigma}
\qquad
\text{A-Split}\ \frac{s \vdash_A \Phi;\ \Sigma}{s \vdash_A \Gamma[u[\mathrm{db}]];\ \Sigma \qquad s \vdash_A \Phi \setminus \Gamma;\ \Sigma}
$$

if $s = (n, u[\mathrm{db}])$ for some $n$, $\Gamma$ consists of all classical formulas in $\Phi$, $\Gamma[u[\mathrm{db}]]$ is
obtained from $\Gamma$ by replacing every free occurrence of the variable db in all its formulas
by $u[\mathrm{db}]$, $\Gamma \neq \emptyset$, and $\Gamma[u[\mathrm{db}]] \neq \Phi$.

The left rule exploits the equivalence $E(\varphi \wedge \Phi) \equiv E\varphi \wedge E\Phi$ if $\varphi$ is classical, and the
right rule exploits the equivalence $A(\varphi \vee \Phi) \equiv A\varphi \vee A\Phi$ if $\varphi$ is classical.

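Rules for path quantifiers. The next rules eliminate path quantifiers, where $Q \in \{E, A\}$.

$$
\text{E-Elim}\ \frac{s \vdash_E Q\varphi, \Phi;\ \Sigma}{s \vdash_Q \varphi;\ s \vdash_E \Phi;\ \Sigma}
\qquad
\text{A-Elim}\ \frac{s \vdash_A Q\varphi, \Phi;\ \Sigma}{s \vdash_Q \varphi;\ \Sigma \qquad s \vdash_A \Phi;\ \Sigma}
$$

The soundness of the left rule follows from the equivalences $E(Q\varphi \wedge \Phi) \equiv EQ\varphi \wedge
E\Phi \equiv Q\varphi \wedge E\Phi$, and the soundness of the right rule follows from the equivalences
$A(Q\varphi \vee \Phi) \equiv AQ\varphi \vee A\Phi \equiv Q\varphi \vee A\Phi$.

The above rules apply also if $\Phi$ is empty. Notice that in this case $\Phi$ represents the
empty conjunction in $s \vdash_E \Phi$, which is equivalent to $\top$, and the empty disjunction in
$s \vdash_A \Phi$, which is equivalent to $\bot$.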

When applied exhaustively, the rules so far lead to sequents that all have the form
$s \vdash_Q \Phi$ such that (a) $\Phi$ consists of classical formulas only, or (b) $\Phi$ consists of modal
atoms only with top-level operators from $\{U, R, X, \bar{X}\}$.

Rules to expand U and R formulas. The following rules perform one-step expansions
of modal atoms with U and R operators.

$$
\text{U-Exp}\ \frac{s \vdash_Q (\varphi\,U\,\psi), \Phi;\ \Sigma}{s \vdash_Q \psi \vee (\varphi \wedge X(\varphi\,U\,\psi)), \Phi;\ \Sigma}
\qquad
\text{R-Exp}\ \frac{s \vdash_Q (\varphi\,R\,\psi), \Phi;\ \Sigma}{s \vdash_Q \psi \wedge (\varphi \vee \bar{X}(\varphi\,R\,\psi)), \Phi;\ \Sigma}
$$

When applied exhaustively, the rules so far lead to sequents that all have the form $s \vdash_Q \Phi$
such that (a) $\Phi$ consists of classical formulas only, or (b) $\Phi$ consists of modal atoms only
with top-level operators from $\{X, \bar{X}\}$.

Rules to simplify X and $\bar{X}$ formulas. Below we define inference rules for one-step
expansions of sequents of the form $s \vdash_Q X\varphi$ and $s \vdash_Q \bar{X}\varphi$. The following inference rules
prepare their application.

$$
\text{E-X-Simp}\ \frac{s \vdash_E X\varphi_1, \ldots, X\varphi_n, \bar{X}\psi_1, \ldots, \bar{X}\psi_m;\ \Sigma}{s \vdash_E Y(\varphi_1 \wedge \cdots \wedge \varphi_n \wedge \psi_1 \wedge \cdots \wedge \psi_m);\ \Sigma}
$$

if $n + m > 0$, where $Y = \bar{X}$ if $n = 0$, else $Y = X$. Intuitively, if at least one of the modal atoms
in the premise is an X-formula then a successor state must exist to satisfy it, hence the
X-formula in the conclusion. Similarly:

$$
\text{A-X-Simp}\ \frac{s \vdash_A X\varphi_1, \ldots, X\varphi_n, \bar{X}\psi_1, \ldots, \bar{X}\psi_m;\ \Sigma}{s \vdash_A Y(\varphi_1 \vee \cdots \vee \varphi_n \vee \psi_1 \vee \cdots \vee \psi_m);\ \Sigma}
$$

if $n + m > 0$, where $Y = X$ if $m = 0$, else $Y = \bar{X}$.

The correctness of this rule follows from the equivalences $A(\bar{X}\varphi \vee \bar{X}\psi) \equiv A(X\varphi \vee
\bar{X}\psi) \equiv A\bar{X}(\varphi \vee \psi)$.

To summarize, with the rules so far, all sequents can be brought into one of the
following forms: (a) $s \vdash_Q \Gamma$, where $\Gamma$ consists of classical formulas only, (b) $s \vdash_Q X\varphi$,
or (c) $s \vdash_Q \bar{X}\varphi$.
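
The expansion and simplification steps above are purely syntactic and can be sketched as two small functions. This is an illustrative sketch only, assuming a hypothetical tuple encoding of formulas in which `"X"` is the strong and `"wX"` the weak next operator; the function names `expand` and `x_simp` are not taken from the paper.

```python
from functools import reduce

def expand(f):
    """One-step expansion of a U or R modal atom (rules U-Exp and R-Exp)."""
    op, phi, psi = f
    if op == "U":                       # psi or (phi and X(phi U psi))
        return ("or", psi, ("and", phi, ("X", f)))
    if op == "R":                       # psi and (phi or wX(phi R psi))
        return ("and", psi, ("or", phi, ("wX", f)))
    raise ValueError("not a U or R formula")

def x_simp(q, atoms):
    """Merge the X / weak-X atoms of one sequent into one atom (E-X-Simp, A-X-Simp)."""
    strong = [f[1] for f in atoms if f[0] == "X"]
    weak = [f[1] for f in atoms if f[0] == "wX"]
    if not atoms or len(strong) + len(weak) != len(atoms):
        raise ValueError("expects a non-empty list of X / wX atoms")
    if q == "E":
        conn, y = "and", ("wX" if not strong else "X")   # Y is weak only if n = 0
    else:
        conn, y = "or", ("X" if not weak else "wX")      # Y is strong only if m = 0
    body = reduce(lambda a, b: (conn, a, b), strong + weak)
    return (y, body)

print(x_simp("E", [("X", "p"), ("wX", "q")]))
print(x_simp("A", [("X", "p"), ("wX", "q")]))
```

In the E case a single strong atom forces a strong result, while in the A case a single weak atom forces a weak result, mirroring the side conditions on Y.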

Rule to close branches. The following rule derives no conclusions and this way indicates
that a branch in a tableau is "closed".

$$
\text{Unsat}\ \frac{s_1 \vdash_{Q_1} \Phi_1;\ \cdots;\ s_n \vdash_{Q_n} \Phi_n}{}
$$

if every $\Phi_i$ consists of closed classical formulas, and $\bigwedge (D \cup \Phi_1 \cup \cdots \cup \Phi_n)$ is unsatisfiable.

Rules to expand X and $\bar{X}$ formulas.

$$
\text{E-}\bar{X}\text{-Exp}\ \frac{(m, t) \vdash_E \bar{X}\varphi;\ \Sigma}{(n_1, u_1[t]) \vdash_E \gamma_1[t] \wedge \varphi;\ \Sigma \quad \cdots \quad (n_k, u_k[t]) \vdash_E \gamma_k[t] \wedge \varphi;\ \Sigma \qquad (m, t) \vdash_E \neg\gamma_1[t] \wedge \cdots \wedge \neg\gamma_k[t];\ \Sigma}
$$

if there is a $k \geq 0$ such that $m \xrightarrow{\gamma_i, u_i} n_i$ are all transitions in P emerging from $m$, where
$1 \leq i \leq k$.

This rule binds the variable db in the guards to the term $t$, which represents the
current database, while it leaves the formula $\varphi$ untouched. The variable db in $\varphi$ refers
to the databases in the successor states, i.e., the databases $u_i[t]$. The rules to separate
classical sequents above will bind db in $\varphi$ correctly.

There is also a rule E-X-Exp whose premise sequent is made with the $X$ operator
instead of $\bar{X}$. It differs from the E-$\bar{X}$-Exp rule only by leaving away the rightmost
conclusion. We do not display it here for space reasons. We note that both rules are defined
also if $k = 0$.

$$
\text{A-X-Exp}\ \frac{(m, t) \vdash_A X\varphi;\ \Sigma}{(n_1, u_1[t]) \vdash_A \neg\gamma_1[t] \vee \varphi;\ \cdots;\ (n_k, u_k[t]) \vdash_A \neg\gamma_k[t] \vee \varphi;\ (m, t) \vdash_E \gamma_1[t] \vee \cdots \vee \gamma_k[t];\ \Sigma}
$$

if there is a $k \geq 0$ such that $m \xrightarrow{\gamma_i, u_i} n_i$ are all transitions in P emerging from $m$, where
$1 \leq i \leq k$.

This rule will, for each of the conclusion sequents, lead to a case distinction (via
branching) on whether the guard of a transition is true or not. Only if the guard is true must the
transition be taken. The conclusion sequent $(m, t) \vdash_E \gamma_1[t] \vee \cdots \vee \gamma_k[t]$ forces that
at least one guard is true. Analogously to above, there is also a rule A-$\bar{X}$-Exp for the $\bar{X}$
case, which does not include this sequent. This reflects that $\bar{X}$ formulas are true in states
without successor.

Both rules also work as expected if $k = 0$: for A-X-Exp the formula in the sequent
$(m, t) \vdash_E \gamma_1[t] \vee \cdots \vee \gamma_k[t]$ is equivalent to $\bot$ (false); for A-$\bar{X}$-Exp the premise sequent
is deleted. If additionally $\Sigma$ is empty then the result is a node with the empty set of
sequents. This does not indicate branch closure; branch closure is indicated by deriving
no conclusions, not a unit-conclusion, even if empty.

This concludes the presentation of the tableau calculus. As said above, we enforce
derivations to be finite by imposing a user-specified maximal length on the number of
state transitions a run executes. This is realized as a check in the rules to expand X and
$\bar{X}$ formulas by pretending a value $k = 0$ of transitions emerging from the node of the
considered state, if the run to that state becomes too long. (This is not formalized above.)
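
The expansion rules for the next operators, together with the length bound, can be sketched as follows. This is a hedged illustration rather than the calculus as implemented: the transition table, the symbolic encodings `("guard", g, t)`, `("and*", [...])` and `("or*", [...])` (n-ary conjunction and disjunction, empty lists standing for true and false), and the parameters `depth` and `max_len` are all assumptions of this sketch, and the remaining sequents of the node are left out.

```python
def successors(transitions, m, depth, max_len):
    """Transitions leaving node m; pretend there are none once the bound is hit."""
    return [] if depth >= max_len else transitions.get(m, [])

def e_next(m, t, phi, transitions, depth, max_len, weak):
    """E-rule for X (weak=False) or weak X (weak=True); returns a list of branches.
    An empty list means the rule derives no conclusions, i.e. the branch closes."""
    out = successors(transitions, m, depth, max_len)
    branches = [[((n, (u, t)), "E", ("and", ("guard", g, t), phi))] for g, u, n in out]
    if weak:   # extra conclusion: no guard holds, so there is no successor
        no_guard = ("and*", [("not", ("guard", g, t)) for g, _, _ in out])
        branches.append([((m, t), "E", no_guard)])
    return branches

def a_next(m, t, phi, transitions, depth, max_len, weak):
    """A-rule for X or weak X; returns the sequents of the single conclusion node."""
    out = successors(transitions, m, depth, max_len)
    node = [((n, (u, t)), "A", ("or", ("not", ("guard", g, t)), phi)) for g, u, n in out]
    if not weak:   # strong next: at least one guard must be true
        node.append(((m, t), "E", ("or*", [("guard", g, t) for g, _, _ in out])))
    return node

# Hypothetical process fragment: (guard, update, target) triples per source node.
T = {"init": [("in_stock", "ship", "packed"), ("out_of_stock", "reorder", "waiting")]}
print(len(e_next("init", "db", "phi", T, depth=0, max_len=3, weak=True)))
print(e_next("init", "db", "phi", T, depth=3, max_len=3, weak=False))
```

At the bound, the strong E-rule returns no branches (closure), whereas the strong A-rule returns a node whose only sequent is the empty disjunction, which is left to the prover to refute.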

For this bounded model checking setting we obtain a formal soundness and completeness
result for the (hence, bounded) unrestricted model checking problem. More
precisely, given a specification S = (P, D, C), $(s_0, S) \models \varphi$ holds for every database $s_0$
relative to all runs of maximal length shorter than a given finite length $l$ if and only if the
fully expanded tableau with initial node $(n_0, \mathrm{db}) \vdash_E \neg\varphi_C$ is closed. (A tableau is closed if
each of its leaves is closed as determined by the Unsat rule or the E-X-Exp rule.)

The Unsat tableau rule requires a call to a (sound) first-order theorem prover. Depending
on the underlying syntactic fragment of FOL these calls may not always terminate.
However, if a classical sequent is provably satisfiable then it is possible to extract
from the tableau branch a run that constitutes a counterexample to the given problem.
Moreover, this sequent will often represent general conditions on the initial database $s_0$
under which the query $\varphi$ is not satisfied by $(s_0, S)$, and this way provide more valuable
feedback than a fully concrete database.

6 Practice and Experiments 

In this section, we provide some notes on the implementation of the theory presented
in the preceding sections.

Satisfiability Checking on the Nodes. Before we can model-check the truth of formulas
over the graph structure of a full specification, we must be able to evaluate first-order 
formulas with respect to nodes within that graph. When performing checking with a 
concrete initial state, all subsequent states will be concrete as well, and evaluating quan- 
tified formulas is straightforward as long as quantification is over finite domains, as is 
typical. On the other hand, if the initial state is only characterised with a formula, then 
checking satisfiability of formulas with respect to that node and all its successors be- 
comes a full-blown theorem-proving problem. 

We solve this problem by translating to the standard TPTP format [13], which has
recently been extended to include arithmetic [12], and then using off-the-shelf first-order
provers. Our current backend is SPASS+T [10], which has good support for arithmetic
in addition to sorted first-order logic.
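
To give an impression of what such a proof obligation might look like, the sketch below renders a tiny problem in TPTP's typed first-order form with integer arithmetic. It is not output of the tool described here: the constant `stock`, the axiom, the conjecture and the helper `tff_problem` are hypothetical, and only the general `tff(name, role, formula).` shape and the arithmetic symbols `$int`, `$greater`, `$greatereq` and `$difference` are taken from the TPTP language.

```python
def tff_problem(int_consts, axioms, conjecture):
    """Render a typed first-order (TFF) problem with integer constants as TPTP text."""
    lines = [f"tff({c}_type, type, {c}: $int)." for c in int_consts]
    lines += [f"tff(ax_{i}, axiom, {a})." for i, a in enumerate(axioms, start=1)]
    lines.append(f"tff(goal, conjecture, {conjecture}).")
    return "\n".join(lines)

problem = tff_problem(
    int_consts=["stock"],                                # hypothetical database field
    axioms=["$greater(stock, 0)"],                       # hypothetical guard: stock > 0
    conjecture="$greatereq($difference(stock, 1), 0)",   # stock - 1 >= 0 after one shipment
)
print(problem)
```

A prover that reports the conjecture as a theorem of the axioms has, in effect, shown the corresponding negated sequent to be unsatisfiable.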

Model Checking. For concrete model checking, we assume that there are no two definitions
for the same predicate symbol, that definitions are not recursive, and that all quantifications
inside the bodies $\varphi$ range over concrete data items. With these assumptions, definitions
can be expanded as necessary, and we can efficiently decide if formulas (edges'
guards and the classical sub-formulas of the model checking problem) are satisfied with
respect to concrete database values. In theory, SPASS+T should do the same, but we
have found that our own custom guard evaluator performs better, and is also guaranteed
to terminate. When performing concrete model checking, we can also execute scripts
directly as Groovy programs rather than needing to manipulate them as first-order terms.
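
Under the three assumptions above, evaluating a guard against a concrete database is a terminating recursion. The following sketch illustrates the idea only; it is not the custom guard evaluator mentioned in the text. The tuple encoding, the `("call", name, var...)` form for definition expansion, the example database and the definition `valid` are all hypothetical.

```python
def evaluate(f, db, defs, env=None):
    """Decide formula f over a concrete database db.
    defs maps a predicate name to (parameter names, body); no recursion allowed."""
    env = env or {}
    op = f[0]
    if op == "atom":                     # ("atom", fn) with fn(db, env) -> bool
        return bool(f[1](db, env))
    if op == "not":
        return not evaluate(f[1], db, defs, env)
    if op == "and":
        return all(evaluate(g, db, defs, env) for g in f[1:])
    if op == "or":
        return any(evaluate(g, db, defs, env) for g in f[1:])
    if op in ("forall", "exists"):       # (op, var, domain_fn, body), finite domain
        _, var, domain, body = f
        results = (evaluate(body, db, defs, {**env, var: x}) for x in domain(db, env))
        return all(results) if op == "forall" else any(results)
    if op == "call":                     # expand a (non-recursive) definition
        params, body = defs[f[1]]
        args = [env[a] for a in f[2:]]
        return evaluate(body, db, defs, dict(zip(params, args)))
    raise ValueError(f"unknown operator: {op}")

# Hypothetical concrete database and business-rule definition.
db = {"orders": [{"qty": 2}, {"qty": 0}]}
defs = {"valid": (["o"], ("atom", lambda db, env: env["o"]["qty"] > 0))}
orders = lambda db, env: db["orders"]

all_valid = ("forall", "o", orders, ("call", "valid", "o"))
some_valid = ("exists", "o", orders, ("call", "valid", "o"))
print(evaluate(all_valid, db, defs), evaluate(some_valid, db, defs))
```

Termination follows because every quantifier ranges over a finite collection taken from the database and every definition expansion strictly descends through a non-recursive set of definitions.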

We have fully implemented the preceding section's generic tableau system for concrete
model checking, giving us an efficient procedure that is guaranteed to terminate on
problems given a depth-bound. In our practical experiments on the example in Section 2
we could (dis)prove queries like the ones mentioned there in very short time.

Our implementation is also capable of generating proof obligations in the TPTP
format for unbounded model checking. It also emits the necessary axioms to reflect the
semantics of objects and arrays, as explained in Section 3. We have experimented with
smaller examples and found that SPASS+T is capable of handling them. At the current
stage, however, the implementation is not yet mature enough, and our experiments
are too preliminary to report on. We also plan to consider alternatives to SPASS+T by
implementing the calculus in [2] and by linking in SMT solvers.

7 Conclusions and Future Work 

We described a novel approach to modelling and reasoning about data-centric busi- 
ness processes. Our modelling language treats data, process fragments, constraints and 
logical definitions of business rules on a par. Our research plan focuses on providing 
strong analytical capabilities on the corresponding models by taking all these compo- 
nents into account. The main ambition is to go beyond model checking from concrete 
initial states. To this end we have devised a novel tableau calculus that reduces what we 
called unrestricted model checking problems to first-order logic over arithmetic. 

Our main contributions so far are conceptual in nature. Our main theoretical result
is the soundness and completeness of the tableau calculus, as explained at the end of
Section 5. Our implementation is already fully functional for concrete model checking.
Much remains to be done, at various levels. The tableau implementation needs to be
completed and improved for efficiency, and more experiments need to be carried out.

The main motivation for using JSON and Groovy is their widespread acceptance
in practice and available tool support, which we exploit in our implementation. For the
same reason we want to extend our modelling language by front-ends for established
business process modelling techniques, in particular BPMN. This also raises some
non-trivial and interesting theoretical issues, for example, how to map BPMN's parallel-And
construct into our framework. We expect that by using process fragments and constraints
on them an isomorphic mapping is possible.